Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations
Abstract
This paper describes the optimization of a cost function for segment selection in concatenative Text-to-Speech (TTS) based on perceptual characteristics. As the integrated cost function for a segment sequence, we use the norm of the local costs of the individual segments, so that both the degradation of naturalness over the entire synthetic speech and local degradation are taken into account. The cost function is optimized by adjusting not only the power coefficient of the norm but also the weights for the sub-costs, so that the integrated cost corresponds better to perceptual scores obtained in perceptual experiments. The results show that optimizing both the weights and the power coefficient improves the correspondence more than optimizing either one alone. However, the correspondence remains insufficient even after the integrated cost function has been optimized.
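The following is a minimal sketch of what a norm-based integrated cost of this kind might look like. It is not the authors' implementation. It assumes that the local cost of a segment is a weighted sum of its sub-costs and that the norm is normalized by the number of segments (a power mean); the abstract states neither detail. The function names `local_cost` and `integrated_cost` are hypothetical.

```python
import math
from typing import Sequence


def local_cost(sub_costs: Sequence[float], weights: Sequence[float]) -> float:
    """Local cost of one segment.

    Assumption: a weighted sum of non-negative sub-costs.
    """
    if len(sub_costs) != len(weights):
        raise ValueError("sub_costs and weights must have the same length")
    return sum(w * c for w, c in zip(weights, sub_costs))


def integrated_cost(
    segments: Sequence[Sequence[float]],
    weights: Sequence[float],
    p: float,
) -> float:
    """Integrated cost of a segment sequence as a p-norm of local costs.

    Assumption: the norm is divided by the number of segments, so that
    p = 1 gives the average local cost and large p approaches the
    maximum local cost.
    """
    if p <= 0:
        raise ValueError("power coefficient p must be positive")
    if not segments:
        raise ValueError("segments must not be empty")
    costs = [local_cost(s, weights) for s in segments]
    return (sum(c ** p for c in costs) / len(costs)) ** (1.0 / p)


# Two segments with two illustrative sub-costs each (made-up values).
example_segments = [[1.0, 0.0], [3.0, 0.0]]
example_weights = [1.0, 1.0]

average_like = integrated_cost(example_segments, example_weights, p=1)
rms_like = integrated_cost(example_segments, example_weights, p=2)
max_like = integrated_cost(example_segments, example_weights, p=50)
```

Under these assumptions, a small power coefficient emphasizes overall degradation across the utterance, while a large one emphasizes the worst local degradation. Tuning the weights and the power coefficient jointly would then amount to searching for the values whose integrated cost best tracks the perceptual scores.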
Related resources
Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis
In segment selection for concatenative Text-to-Speech (TTS), it is important to use a cost that corresponds to perceptual characteristics. We clarify how well the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from results of perceptual experiments on the naturalness of synthetic spee...
Mutual-information based segment pre-selection in concatenative text-to-speech
Corpus-based Concatenative Text-To-Speech (CTTS) systems have proven to be a successful method for producing speech with good voice quality. However, they require a large inventory of synthesis segments and complex search algorithms, which sometimes hinder the usability of CTTS. Segment pre-selection aims to prune the candidate segments to achieve the best possible synthesis quality within a pre-defin...
An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis
In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores of the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which shows the degradation of naturalness over the entire synthetic utterance, has better correspondence to the perceptual scores than the maxi...
Target Cost of F0 Based on Pol Concatenative Speec
This paper proposes a target cost function for F0 based on polynomial regression for use in concatenative speech synthesis. Polynomial regression is used to express the time series of F0 continuously and to remove the effects of microprosody. We conducted a perceptual experiment and confirmed that the proposed function provides a higher correlation with perceptual scores than does the conventionally ...
An auditory-based distortion measure with application to concatenative speech synthesis
This study presents a new auditory-based distance measure with application to concatenative speech synthesis. This measure employs the Carney auditory model to produce a feature vector related to auditory perception. For concatenative synthesis, the new measure is employed to assess perceived discontinuities at segment transitions. Evaluations using a restricted database environment show that ...
Publication date: 2003